Startups Study

This notebook investigates the Crunchbase 2013 Snapshot dataset to identify which features distinguish successful startups from failed ones. The dataset is composed of 11 CSV files, including:

- Objects: table with information about the companies, persons and products.
- Offices: information about physical offices of the companies.
- Relationships: information about the relationships between people and companies.
- Investments: links investors to the companies they invested in.
- Funding rounds: information about the funding rounds of the companies.
- Acquisitions: table with information about acquisitions of companies by other companies.

Read the data files

Read the CSV files, drop unnecessary columns and convert date columns to datetime types. Clean the data where necessary.
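The loading step described above could look roughly like the sketch below. It reads from an in-memory string instead of the real files, and the column names (`founded_at`, `logo_url`) and the helper `load_table` are assumptions made for illustration, not the notebook's actual code.

```python
import io
import pandas as pd

# Stand-in for one of the CSV files; the column names are assumptions.
RAW_CSV = io.StringIO(
    "id,name,status,founded_at,logo_url\n"
    "c:1,Acme,operating,2005-10-17,http://example.com/a.png\n"
    "c:2,Globex,acquired,,http://example.com/b.png\n"
    "c:3,Initech,closed,not-a-date,\n"
)

def load_table(source, drop_cols, date_cols):
    """Hypothetical helper: read a CSV, drop columns, parse date columns."""
    df = pd.read_csv(source)
    # errors="ignore" skips listed columns that this table does not have
    df = df.drop(columns=drop_cols, errors="ignore")
    for col in date_cols:
        # errors="coerce" turns missing or unparseable values into NaT
        df[col] = pd.to_datetime(df[col], errors="coerce")
    return df

objects = load_table(RAW_CSV, drop_cols=["logo_url"], date_cols=["founded_at"])
print(objects.dtypes)
```

Coercing bad dates to `NaT` keeps the rows in the table, so the decision to drop or impute them can be made later in the cleaning step.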

Split the objects table into companies, persons, financial organizations and products.
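A minimal sketch of this split, assuming the objects table has an `entity_type` column whose values are `Company`, `Person`, `FinancialOrg` and `Product` (both the column name and the values are assumptions here):

```python
import pandas as pd

# Toy objects table; column names and entity_type values are assumptions.
objects = pd.DataFrame({
    "id": ["c:1", "p:1", "f:1", "r:1", "c:2"],
    "entity_type": ["Company", "Person", "FinancialOrg", "Product", "Company"],
    "name": ["Acme", "Jane Doe", "Big Fund", "Widget", "Globex"],
})

# Build one DataFrame per entity type in a single pass.
by_type = {
    etype: group.reset_index(drop=True)
    for etype, group in objects.groupby("entity_type")
}

companies = by_type["Company"]
persons = by_type["Person"]
financial_orgs = by_type["FinancialOrg"]
products = by_type["Product"]
print({name: len(df) for name, df in by_type.items()})
```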

Create a table with only the companies that appear in the IPO table and add their funding rounds.
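One way to sketch this step is to filter with `isin` and then left-merge a per-company summary of the funding rounds. The key column `object_id`, the amount column `raised_amount_usd` and the summary column names are assumptions for illustration.

```python
import pandas as pd

# Toy tables; all column names are assumptions.
companies = pd.DataFrame({
    "id": ["c:1", "c:2", "c:3"],
    "name": ["Acme", "Globex", "Initech"],
})
ipos = pd.DataFrame({"object_id": ["c:1", "c:3"]})
funding_rounds = pd.DataFrame({
    "object_id": ["c:1", "c:1", "c:2"],
    "raised_amount_usd": [1_000_000, 5_000_000, 250_000],
})

# Keep only the companies that appear in the IPO table.
ipo_companies = companies[companies["id"].isin(ipos["object_id"])]

# Summarise the funding rounds per company.
rounds_summary = (
    funding_rounds.groupby("object_id")["raised_amount_usd"]
    .agg(n_rounds="count", total_raised_usd="sum")
    .reset_index()
)

# A left merge keeps IPO companies that have no recorded rounds (NaN values).
ipo_companies = ipo_companies.merge(
    rounds_summary, how="left", left_on="id", right_on="object_id"
).drop(columns="object_id")
print(ipo_companies)
```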

Add the number of employees to the companies table and expand the categorical columns.
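The sketch below illustrates both operations under stated assumptions: the employee count is derived by counting distinct people linked to each company in the relationships table (the notebook may compute it differently), and the categorical `status` column is expanded with one-hot encoding, which yields columns such as `status_ipo`. The relationship column names are assumptions.

```python
import pandas as pd

# Toy tables; the relationship column names are assumptions.
companies = pd.DataFrame({
    "id": ["c:1", "c:2", "c:3"],
    "status": ["operating", "ipo", "acquired"],
})
relationships = pd.DataFrame({
    "relationship_object_id": ["c:1", "c:1", "c:2"],
    "person_object_id": ["p:1", "p:2", "p:3"],
})

# Count distinct people per company (assumed proxy for employees).
employees = (
    relationships.groupby("relationship_object_id")["person_object_id"].nunique()
)
# Companies without any relationship get 0 employees.
companies["employees"] = companies["id"].map(employees).fillna(0).astype(int)

# One-hot encode the categorical column: status -> status_ipo, status_acquired, ...
companies = pd.get_dummies(companies, columns=["status"], prefix="status")
print(companies)
```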

Exploratory Analysis

Plot the data to see how it is distributed.

The distribution of company status in the dataset shows that the vast majority of the companies are still operating.
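The numbers behind a status plot like this one can be obtained with `value_counts`. The data below is made up purely to show the call; only the `status` column name is taken as an assumption about the companies table.

```python
import pandas as pd

# Made-up statuses, for illustration only.
companies = pd.DataFrame(
    {"status": ["operating"] * 7 + ["acquired"] * 2 + ["closed"]}
)

# Share of each status as a percentage, most frequent first.
status_share = (
    companies["status"].value_counts(normalize=True).mul(100).round(1)
)
print(status_share)
```

Calling `status_share.plot(kind="bar")` on the result would draw the corresponding bar chart when matplotlib is installed.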

Let's explore the most popular categories among the companies in the dataset:

And the countries with the most startups:
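Both rankings above reduce to the same operation: count values and keep the most frequent ones. A minimal sketch follows, where `top_n` is a hypothetical helper and `category_code` and `country_code` are assumed column names.

```python
import pandas as pd

# Toy data; column names are assumptions.
companies = pd.DataFrame({
    "category_code": ["web", "web", "software", None, "mobile", "web", "software"],
    "country_code": ["USA", "USA", "GBR", "USA", "IND", "GBR", None],
})

def top_n(series, n=10):
    """Hypothetical helper: the n most frequent non-null values with counts."""
    # value_counts ignores missing values and sorts in descending order.
    return series.value_counts().head(n)

top_categories = top_n(companies["category_code"], n=2)
top_countries = top_n(companies["country_code"], n=2)
print(top_categories)
print(top_countries)
```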

Let's see the founding years of the companies by category:
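A year-by-category table like the one this view needs can be built with `pd.crosstab`. This is a sketch with toy data, assuming a datetime column `founded_at` and a `category_code` column.

```python
import pandas as pd

# Toy data; column names are assumptions.
companies = pd.DataFrame({
    "category_code": ["web", "web", "software", "software", "mobile"],
    "founded_at": pd.to_datetime(
        ["2005-03-01", "2007-06-15", "2007-01-10", "2010-09-30", None]
    ),
})

# Drop companies without a founding date so the year stays an integer.
dated = companies.dropna(subset=["founded_at"])

# Rows: founding year, columns: category, cells: number of companies.
by_year = pd.crosstab(
    dated["founded_at"].dt.year,
    dated["category_code"],
    rownames=["founded_year"],
)
print(by_year)
```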

Now plot the physical offices of the companies. The USA and Europe have by far the highest density of offices.

Plotting the number of employees for each company, we see that the majority of companies have 50 or fewer employees. The companies with the most employees are very successful ones.
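The "50 or fewer" observation can be checked numerically by binning the employee counts with `pd.cut`. The values and bin edges below are illustrative assumptions, not figures from the dataset.

```python
import pandas as pd

# Made-up employee counts, for illustration only.
employees = pd.Series([0, 3, 50, 51, 1200], name="employees")

# Bins are right-inclusive; include_lowest=True keeps the value 0 in the first bin.
size_band = pd.cut(
    employees,
    bins=[0, 50, 500, float("inf")],
    labels=["0-50", "51-500", "501+"],
    include_lowest=True,
)

band_counts = size_band.value_counts().sort_index()
share_small = band_counts["0-50"] / band_counts.sum()
print(band_counts)
print(f"Share with 50 or fewer employees: {share_small:.0%}")
```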

The correlation matrix shows the relationship of pairs of variables in the dataframe.

Some relationships, like invested_rounds/invested_companies, are obvious and give no useful information. However, status_ipo is highly correlated with variables like employees, which could be informative. milestones is also highly correlated with status_ipo and acquired.
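A correlation matrix of this kind, and a ranking of the variables most related to `status_ipo`, can be sketched as follows. The values are made up; `DataFrame.corr` uses the Pearson coefficient by default, and whether the notebook uses that default is an assumption.

```python
import pandas as pd

# Made-up numeric features, for illustration only.
features = pd.DataFrame({
    "employees": [5, 10, 2000, 3000, 8],
    "milestones": [1, 0, 4, 5, 1],
    "status_ipo": [0, 0, 1, 1, 0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = features.corr()

# Variables ranked by their correlation with status_ipo,
# excluding the trivial correlation of the column with itself.
ipo_corr = corr["status_ipo"].drop("status_ipo").sort_values(ascending=False)
print(corr.round(2))
print(ipo_corr.round(2))
```

Ranking a single column this way makes it easier to skip the obvious pairs and focus on the relationships with the outcome of interest.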

Let's check the percentage of companies with the IPO status in each country.
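With a 0/1 `status_ipo` column, the percentage per country is the group mean multiplied by 100. A minimal sketch with toy data, assuming a `country_code` column:

```python
import pandas as pd

# Toy data; country_code is an assumed column name.
companies = pd.DataFrame({
    "country_code": ["USA", "USA", "USA", "USA", "JPN", "JPN", "ESP", "ESP"],
    "status_ipo": [1, 0, 0, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the fraction of companies with that status.
ipo_pct = (
    companies.groupby("country_code")["status_ipo"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
    .rename("ipo_pct")
)
print(ipo_pct)
```

Countries with very few companies can show extreme percentages, so it may be worth reading this table next to the company counts.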

Many countries have no companies with the IPO status, and the highest IPO rates all belong to developed Asian countries.

Let's take a look at the countries with many companies and no IPO companies.
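This filter can be sketched by counting companies and IPOs per country and keeping the countries above a size threshold with zero IPOs. The threshold `MIN_COMPANIES` is a hypothetical value chosen for the toy data, and `country_code` is an assumed column name.

```python
import pandas as pd

MIN_COMPANIES = 3  # hypothetical threshold for "many companies"

# Toy data; country_code is an assumed column name.
companies = pd.DataFrame({
    "country_code": ["USA"] * 4 + ["ESP"] * 3 + ["BRA"],
    "status_ipo": [1, 0, 0, 0, 0, 0, 0, 0],
})

# Number of companies and number of IPO companies per country.
per_country = companies.groupby("country_code")["status_ipo"].agg(
    n_companies="size", n_ipo="sum"
)

# Countries with enough companies but no IPO at all, largest first.
no_ipo = per_country[
    (per_country["n_ipo"] == 0) & (per_country["n_companies"] >= MIN_COMPANIES)
].sort_values("n_companies", ascending=False)
print(no_ipo)
```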

Countries with many registered startups but not a single IPO may be countries where it is not easy to develop this kind of business, at least in the public market.